Load all necessary libraries
Function to read txt data files and convert them to proper csv files:
- txtFile: input filename (including directory if applicable)
- csvFile: output filename (including directory if applicable)
- vtabchar: vertical tab character in the original file (to be replaced with the newline character '\n')
- delim: delimiter character used in the original file (to be replaced with a comma)
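The notebook's own implementation is not shown, so the following is only a minimal sketch of what such a conversion function could look like. The function name `txt_to_csv` is hypothetical; the four parameters follow the description above. It assumes the file fits in memory and that no field already contains a comma.

```python
from pathlib import Path


def txt_to_csv(txtFile, csvFile, vtabchar, delim):
    """Rewrite a delimited text file as CSV (hypothetical helper, not the original code)."""
    text = Path(txtFile).read_text(encoding="utf-8")
    # Replace the row separator first, then the field delimiter.
    text = text.replace(vtabchar, "\n").replace(delim, ",")
    Path(csvFile).write_text(text, encoding="utf-8")
```

A call such as `txt_to_csv("in.txt", "out.csv", "\x0b", "|")` would turn vertical tabs into line breaks and pipes into commas; the file names and separators here are placeholders.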
Define the file paths
Load all relevant files
| | X1 | ID_T16.x | Produktionsdatum.x | Herstellernummer.x | Werksnummer.x | Fehlerhaft.x | Fehlerhaft_Datum.x | Fehlerhaft_Fahrleistung.x | ID_T16.y | Produktionsdatum.y | ... | Fehlerhaft.y | Fehlerhaft_Datum.y | Fehlerhaft_Fahrleistung.y | ID_T16 | Produktionsdatum | Herstellernummer | Werksnummer | Fehlerhaft | Fehlerhaft_Datum | Fehlerhaft_Fahrleistung |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 16-212-2121-7 | 2008-11-07 | 212.0 | 2121.0 | 0.0 | NaN | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2 | 16-212-2122-41 | 2008-11-08 | 212.0 | 2122.0 | 0.0 | NaN | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 5 | 16-212-2121-36 | 2008-11-07 | 212.0 | 2121.0 | 0.0 | NaN | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 10 | 16-212-2122-20 | 2008-11-07 | 212.0 | 2122.0 | 0.0 | NaN | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 12 | 16-212-2122-33 | 2008-11-07 | 212.0 | 2122.0 | 0.0 | NaN | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 22 columns
| | Unnamed: 0 | X | X1 | ID_Fahrzeug | Herstellernummer | Werksnummer | Fehlerhaft_Datum | Fehlerhaft_Fahrleistung | days | fuel | engine |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 9 | 9 | 11-1-11-9 | 1 | 11 | 2010-03-16 | 34824.319559 | 1493.150761 | 4.003670 | small |
| 1 | 11 | 11 | 11 | 11-1-11-11 | 1 | 11 | 2010-03-16 | 74217.428309 | 1044.462231 | 11.042487 | large |
| 2 | 13 | 13 | 13 | 11-1-11-13 | 1 | 11 | 2010-03-16 | 32230.699639 | 749.669810 | 3.579117 | small |
| 3 | 15 | 15 | 15 | 11-1-11-15 | 1 | 11 | 2010-03-16 | 44885.783551 | 858.688003 | 4.666801 | small |
| 4 | 37 | 37 | 37 | 11-1-11-37 | 1 | 11 | 2010-03-17 | 86348.329866 | 1478.204174 | 4.634381 | small |
| | Unnamed: 0 | IDNummer | Wareneingang | Herstellernummer | Werksnummer | Fehlerhaft |
|---|---|---|---|---|---|---|
| 0 | 1 | K7-113-1132-153160 | 2016-11-22 | 112 | 1132 | 0 |
| 1 | 2 | K7-113-1132-153109 | 2016-11-20 | 112 | 1132 | 0 |
| 2 | 3 | K7-113-1132-153195 | 2016-11-20 | 112 | 1132 | 0 |
| 3 | 4 | K7-113-1132-153226 | 2016-11-20 | 112 | 1132 | 0 |
| 4 | 5 | K7-113-1132-153231 | 2016-11-20 | 112 | 1132 | 0 |
| | Unnamed: 0 | IDNummer | Gemeinden | Zulassung |
|---|---|---|---|---|
| 0 | 408097 | 11-1-11-1 | DRESDEN | 01-01-2009 |
| 1 | 408098 | 11-1-11-2 | DRESDEN | 01-01-2009 |
| 2 | 1 | 12-1-12-1 | LEIPZIG | 01-01-2009 |
| 3 | 2 | 12-1-12-2 | LEIPZIG | 01-01-2009 |
| 4 | 3 | 12-1-12-3 | DORTMUND | 01-01-2009 |
| | Unnamed: 0 | ID_Karosserie | ID_Schaltung | ID_Sitze | ID_Motor | ID_Fahrzeug |
|---|---|---|---|---|---|---|
| 0 | 1 | K5-112-1122-1 | K3SG1-105-1051-4 | K2ST1-109-1092-1 | K1BE1-101-1011-90 | 12-1-12-1 |
| 1 | 2 | K5-112-1122-11 | K3AG1-105-1051-9 | K2ST1-109-1092-16 | K1BE1-101-1011-2 | 12-1-12-2 |
| 2 | 3 | K5-112-1122-2 | K3AG1-105-1051-23 | K2ST1-109-1092-21 | K1BE1-101-1011-8 | 12-1-12-3 |
| 3 | 4 | K5-112-1122-3 | K3AG1-106-1061-66 | K2ST1-109-1092-70 | K1BE1-101-1011-11 | 12-1-12-4 |
| 4 | 5 | K5-112-1122-4 | K3SG1-105-1051-8 | K2ST1-109-1092-78 | K1BE1-101-1011-42 | 12-1-12-5 |
| | Unnamed: 0 | IDNummer | Produktionsdatum | Herstellernummer | Werksnummer | Fehlerhaft |
|---|---|---|---|---|---|---|
| 0 | 1 | K7-114-1142-1 | 2008-11-12 | 114 | 1142 | 0 |
| 1 | 2 | K7-114-1142-2 | 2008-11-12 | 114 | 1142 | 0 |
| 2 | 3 | K7-114-1142-3 | 2008-11-13 | 114 | 1142 | 0 |
| 3 | 4 | K7-114-1142-4 | 2008-11-13 | 114 | 1142 | 0 |
| 4 | 5 | K7-114-1142-5 | 2008-11-13 | 114 | 1142 | 0 |
| | ID_T16 | ID_Komponente |
|---|---|---|
| 0 | 16-213-2132-44 | K2LE2-111-1111-1 |
| 1 | 16-215-2152-68 | K2LE2-111-1111-2 |
| 2 | 16-212-2121-9 | K2LE2-111-1111-3 |
| 3 | 16-212-2121-16 | K2LE2-111-1111-4 |
| 4 | 16-212-2121-19 | K2LE2-111-1111-5 |
Fahrzeuge_Merged_df
1. Logistics and Product Development in the Automobile Industry
| | Unnamed: 0 | IDNummer | Produktionsdatum | Herstellernummer | Werksnummer | Fehlerhaft |
|---|---|---|---|---|---|---|
| 0 | 1 | K7-114-1142-1 | 2008-11-12 | 114 | 1142 | 0 |
| 1 | 2 | K7-114-1142-2 | 2008-11-12 | 114 | 1142 | 0 |
| 2 | 3 | K7-114-1142-3 | 2008-11-13 | 114 | 1142 | 0 |
| 3 | 4 | K7-114-1142-4 | 2008-11-13 | 114 | 1142 | 0 |
| 4 | 5 | K7-114-1142-5 | 2008-11-13 | 114 | 1142 | 0 |
| | Unnamed: 0 | IDNummer | Wareneingang | Herstellernummer | Werksnummer | Fehlerhaft |
|---|---|---|---|---|---|---|
| 0 | 1 | K7-113-1132-153160 | 2016-11-22 | 112 | 1132 | 0 |
| 1 | 2 | K7-113-1132-153109 | 2016-11-20 | 112 | 1132 | 0 |
| 2 | 3 | K7-113-1132-153195 | 2016-11-20 | 112 | 1132 | 0 |
| 3 | 4 | K7-113-1132-153226 | 2016-11-20 | 112 | 1132 | 0 |
| 4 | 5 | K7-113-1132-153231 | 2016-11-20 | 112 | 1132 | 0 |
| | IDNummer | Produktionsdatum | Herstellernummer | Werksnummer | Fehlerhaft |
|---|---|---|---|---|---|
| 0 | K7-114-1142-1 | 2008-11-12 | 114 | 1142 | 0 |
| 1 | K7-114-1142-2 | 2008-11-12 | 114 | 1142 | 0 |
| 2 | K7-114-1142-3 | 2008-11-13 | 114 | 1142 | 0 |
| 3 | K7-114-1142-4 | 2008-11-13 | 114 | 1142 | 0 |
| 4 | K7-114-1142-5 | 2008-11-13 | 114 | 1142 | 0 |
| | IDNummer | Produktionsdatum | Issued_Products | Wareneingang | LogisticsDelay |
|---|---|---|---|---|---|
| 0 | K7-114-1142-1 | 2008-11-12 | 2008-11-13 | 2008-11-19 | 7 |
| 1 | K7-114-1142-2 | 2008-11-12 | 2008-11-13 | 2008-11-19 | 7 |
| 2 | K7-114-1142-3 | 2008-11-13 | 2008-11-14 | 2008-11-20 | 7 |
| 3 | K7-114-1142-4 | 2008-11-13 | 2008-11-14 | 2008-11-20 | 7 |
| 4 | K7-114-1142-5 | 2008-11-13 | 2008-11-14 | 2008-11-19 | 6 |
| ... | ... | ... | ... | ... | ... |
| 306485 | K7-113-1132-153241 | 2016-11-12 | 2016-11-13 | 2016-11-19 | 7 |
| 306486 | K7-113-1132-153242 | 2016-11-12 | 2016-11-13 | 2016-11-19 | 7 |
| 306487 | K7-113-1132-153243 | 2016-11-12 | 2016-11-13 | 2016-11-20 | 8 |
| 306488 | K7-113-1132-153244 | 2016-11-12 | 2016-11-13 | 2016-11-18 | 6 |
| 306489 | K7-113-1132-153245 | 2016-11-13 | 2016-11-14 | 2016-11-20 | 7 |
306490 rows × 5 columns
Interpretation: The histogram and density plot of the sample show multiple peaks, suggesting that the data do not follow a normal distribution.
Shapiro-Wilk Test: Statistics=0.8907687664031982, p-value=3.433181237595802e-42 The logistics delay does not follow a normal distribution (reject H0).
Interpretation: Both the visual representation and the Shapiro-Wilk test indicate that the logistics delay does not follow a normal distribution. Other candidate distributions are therefore tested next.
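The code cell behind the Shapiro-Wilk output is not part of this export. As a rough sketch, the test could be run as follows; the `delays` array is synthetic stand-in data (whole days, mostly 6 to 8), not the real LogisticsDelay column, and the 5% significance level is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for a sample of the LogisticsDelay column (assumption).
delays = rng.choice([5, 6, 7, 8, 9], size=3000, p=[0.03, 0.27, 0.42, 0.21, 0.07])

# H0: the sample comes from a normal distribution.
statistic, p_value = stats.shapiro(delays)
reject_h0 = p_value < 0.05
print(f"Shapiro-Wilk Test: Statistics={statistic}, p-value={p_value}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Working on a sample rather than all 306,490 rows is a sensible choice here, because SciPy warns that the Shapiro-Wilk p-value may be inaccurate for more than 5,000 observations.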
Kolmogorov-Smirnov Test: Statistics=0.7158238172920065, p-value=0.0 The logistics delay does not follow an exponential distribution (reject H0).
Interpretation: The Kolmogorov-Smirnov test indicates that the logistics delay does not follow an exponential distribution.
Kolmogorov-Smirnov Test for Gamma Distribution: Statistics=0.21600036314061488, p-value=5.208266151703579e-126 The logistics delay does not follow a gamma distribution (reject H0).
Interpretation: The Kolmogorov-Smirnov test indicates that the logistics delay does not follow a gamma distribution.
Kolmogorov-Smirnov Test for Log-Normal Distribution: Statistics=0.2179015832535024, p-value=2.9672178452961654e-128 The logistics delay does not follow a log-normal distribution (reject H0).
Interpretation: The Kolmogorov-Smirnov test indicates that the logistics delay does not follow a log-normal distribution.
The results of our statistical tests indicate that the data do not follow a normal, exponential, gamma or log-normal distribution. The logistics delay appears to have a more complex distribution that none of these parametric families captures.
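The three Kolmogorov-Smirnov tests can be sketched in one loop. This is an illustration on synthetic data, not the original code: the `delays` array is made up, each distribution is fitted by maximum likelihood, and fixing the location at zero (`floc=0`) is an assumption chosen to keep the fits stable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for a sample of the LogisticsDelay column (assumption).
delays = rng.choice([5, 6, 7, 8, 9], size=3000, p=[0.03, 0.27, 0.42, 0.21, 0.07]).astype(float)

candidates = {
    "exponential": stats.expon,
    "gamma": stats.gamma,
    "log-normal": stats.lognorm,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(delays, floc=0)  # fit shape/scale, location fixed at 0
    # H0: the sample follows the fitted distribution.
    statistic, p_value = stats.kstest(delays, dist.cdf, args=params)
    results[name] = (statistic, p_value)
    print(f"{name}: Statistics={statistic:.4f}, p-value={p_value:.3g}")
```

Two caveats apply to this kind of test: the Kolmogorov-Smirnov test assumes a continuous distribution, while delays in whole days are discrete, and estimating the parameters from the same sample makes the p-values approximate.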
To gain more insight into the data, we can use kernel density estimation (KDE) to model the distribution without assuming any specific parametric form.
The KDE plot shows multiple peaks, reinforcing the indication that the logistics delay is multimodal. Specifically, we notice peaks around 6, 7, 8 and 9 days, which suggests that there may be distinct groups (clusters) within the data.
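A KDE and its peaks can be computed without any plotting, which makes the multimodality easy to check numerically. The sketch below uses SciPy's `gaussian_kde` on synthetic data; the bandwidth factor of 0.1 and the evaluation grid are assumptions, and the notebook may have used a different KDE implementation.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic stand-in for a sample of the LogisticsDelay column (assumption).
delays = rng.choice([5, 6, 7, 8, 9], size=3000, p=[0.03, 0.27, 0.42, 0.21, 0.07]).astype(float)

# A small bandwidth factor keeps neighbouring whole-day values from merging.
kde = gaussian_kde(delays, bw_method=0.1)
grid = np.linspace(delays.min() - 1, delays.max() + 1, 600)
density = kde(grid)

# Local maxima of the estimated density mark the modes.
peaks, _ = find_peaks(density)
print("Modes near:", np.round(grid[peaks], 1))
```

With a larger bandwidth the peaks blur into a single hump, so the number of visible modes depends on this choice.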
To get a better understanding, we can use unsupervised learning to identify patterns and structures within the data, more specifically cluster analysis.
Summary statistics of LogisticsDelay per cluster:
| Cluster | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 0 | 1281 | 7.0 | 0.0 | 7.0 | 7.0 | 7.0 | 7.0 | 7.0 |
| 1 | 638 | 8.0 | 0.0 | 8.0 | 8.0 | 8.0 | 8.0 | 8.0 |
| 2 | 838 | 6.0 | 0.0 | 6.0 | 6.0 | 6.0 | 6.0 | 6.0 |
| 3 | 233 | 9.253219 | 0.541629 | 9.0 | 9.0 | 9.0 | 9.0 | 12.0 |
| 4 | 75 | 5.0 | 0.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 |
After performing the Cluster Analysis we can confirm that the distribution is multimodal. More specifically, delays cluster around:
5 days
6 days
7 days
8 days
9 to 12 days
This indicates that the delays aren't spread evenly across a range but are instead concentrated at specific points.
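The clustering algorithm is not named in the text. The sketch below assumes k-means with five clusters, which is one plausible way to obtain per-cluster summaries like the ones above; the data are synthetic and the parameter choices are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for a sample of the LogisticsDelay column (assumption).
sample = pd.DataFrame({
    "LogisticsDelay": rng.choice(
        [5, 6, 7, 8, 9, 10], size=3000, p=[0.03, 0.27, 0.42, 0.20, 0.07, 0.01]
    )
})

# k-means with k=5 is an assumption; the original algorithm is not stated.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
sample["Cluster"] = kmeans.fit_predict(sample[["LogisticsDelay"]])

# One row of summary statistics per cluster, as in the output above.
summary = sample.groupby("Cluster")["LogisticsDelay"].describe()
print(summary)
```

Because the feature is one-dimensional and integer-valued, the clusters essentially reproduce the distinct delay values, with the rare long delays grouped together.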
b. Determine the mean logistics delay, considering weekends. Interpret this number and discuss possible alternatives.
Summary statistics of LogisticsDelay:
| count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|
| 306490 | 7.080437 | 1.012302 | 4.0 | 6.0 | 7.0 | 8.0 | 15.0 |
The mean logistics delay, with weekends included, is 7.08 days. This number reflects the average time it takes for goods to move through the logistics process, including the time when operations might be slower or paused over the weekend. The cluster analysis shows the same picture: Cluster 0, in which the delay is 7 days, has the highest count of observations.
Impact of the weekend: Including weekends in the calculation can inflate the delay times. For instance, if goods are produced on a Friday, they might not move forward in the process until the following Monday, which adds the weekend days to the logistics delay.
A possible alternative would be to introduce or extend weekend operations, for example through automated processes or partial shifts. This would minimize the impact of non-working time during the weekend.
Another alternative would be to avoid producing goods on a Friday, so that the time goods need to move through the logistics process is not inflated by the weekend.
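To make the weekend effect concrete, the calendar-day delay can be compared with a count of business days. The two rows below are taken from the LogisticsDelay table shown earlier, where the delay matches the difference between Wareneingang and Produktionsdatum in calendar days; the `BusinessDays` column and the Monday-to-Friday working week are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "IDNummer": ["K7-114-1142-1", "K7-114-1142-5"],
    "Produktionsdatum": pd.to_datetime(["2008-11-12", "2008-11-13"]),
    "Wareneingang": pd.to_datetime(["2008-11-19", "2008-11-19"]),
})

# Calendar-day delay, weekends included.
df["LogisticsDelay"] = (df["Wareneingang"] - df["Produktionsdatum"]).dt.days

# Business-day delay (Monday to Friday only), excluding the arrival day.
df["BusinessDays"] = np.busday_count(
    df["Produktionsdatum"].values.astype("datetime64[D]"),
    df["Wareneingang"].values.astype("datetime64[D]"),
)

print(df[["IDNummer", "LogisticsDelay", "BusinessDays"]])
print("Mean calendar delay:", df["LogisticsDelay"].mean())
print("Mean business-day delay:", df["BusinessDays"].mean())
```

Reporting both means side by side shows how much of the average delay is attributable to weekends rather than to the logistics process itself.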
c. Visualize the distribution appropriately by displaying the histogram and density function using “plotly.” Describe how you selected the size of the bins.
The bin size of a histogram is crucial because it affects how the data are presented. In this particular case the data are discrete (delays measured in whole days), so a bin size of 1 day is appropriate. Each delay value then has its own bin, which clearly shows the distribution of the delay time.
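The bin choice can be illustrated independently of the plotting library. The sketch below uses NumPy on a small made-up array and centres each bin of width 1 on a whole-day value, so that no two delay values share a bin.

```python
import numpy as np

# Small illustrative sample of delays in whole days (made up).
delays = np.array([5, 6, 6, 7, 7, 7, 8, 8, 9, 12])

# Bin edges at x.5 so that every integer delay sits in the middle of its own bin.
edges = np.arange(delays.min() - 0.5, delays.max() + 1.5, 1.0)
counts, _ = np.histogram(delays, bins=edges)

for left, count in zip(edges[:-1], counts):
    print(f"{int(left + 0.5)} days: {count}")
```

In plotly the same binning can be requested through the histogram's `xbins` setting, for example `xbins=dict(start=4.5, end=12.5, size=1)` for this sample.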
d. Describe the process for creating a decision tree to classify whether the component (K7) is defective (Fehlerhaft) or not. (Hint: Use visualizations.)
Step 1: Import Packages
Import the packages that will be used to create the tree: matplotlib.pyplot, sklearn, etc.
Step 2: Data Preparation
Merge Datasets
Combine Komponente_K7 (the production details of K7) with Logistikverzug_K7 (the logistics delay of K7).
Handle Data
Examine the data, check for missing values and decide how to handle them (either by imputing the mean/median or by deleting rows with missing data).
Step 3: Visualize Data
Plot a correlation matrix to see how the different features (production date, logistics delay, Herstellernummer, Werksnummer) relate to the defectiveness of K7. This can help select the most relevant features for the model.
In addition, a boxplot or a histogram can be used to inspect the distribution of the data and gain a deeper understanding.
Step 4: Define Features and Target Variable
Target Variable:
Is the component defective or not?
Features:
Determine the features that are most relevant for the target variable. For example, the logistics delay can serve as a primary feature, alone or in combination with Herstellernummer.
Step 5: Splitting the data
Split the dataset into a training set and a test set. The training set is used to train the model and the test set to evaluate the performance of the decision tree. A common split is 70% training and 30% testing.
Step 6: Building the decision tree
Use a decision tree classifier from the scikit-learn library. The decision tree algorithm automatically determines the best splits in the data to classify components as defective or not.
Consider tuning hyperparameters such as the maximum depth of the tree and the minimum number of samples per leaf.
Step 7: Evaluate the model
Confusion matrix: Evaluate the model using a confusion matrix to determine the number of true positives, false positives, true negatives, and false negatives.
Step 8: Visualize the decision tree
Visualize the tree, for example using plot_tree from scikit-learn.
Step 9: Final model interpretation
Interpret the tree's structure.
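The steps above can be sketched end to end with scikit-learn. Everything in the example is illustrative: the data are synthetic (including the rule that long delays are more often defective), the feature choice and hyperparameters are assumptions, and `export_text` is used in place of `plot_tree` so that the sketch runs without a plotting backend.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-in for the merged K7 data (assumption).
data = pd.DataFrame({
    "LogisticsDelay": rng.integers(5, 13, size=n),
    "Herstellernummer": rng.choice([112, 114], size=n),
})
# Made-up rule: parts with long delays are defective in 80% of cases.
data["Fehlerhaft"] = ((data["LogisticsDelay"] >= 9) & (rng.random(n) < 0.8)).astype(int)

# Step 4: features and target.
X = data[["LogisticsDelay", "Herstellernummer"]]
y = data["Fehlerhaft"]

# Step 5: 70% training, 30% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Step 6: a shallow tree with a minimum leaf size.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=0)
tree.fit(X_train, y_train)

# Step 7: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, tree.predict(X_test))
print(cm)

# Steps 8 and 9: a text rendering of the learned splits.
print(export_text(tree, feature_names=list(X.columns)))
```

For a graphical view, `sklearn.tree.plot_tree(tree, feature_names=list(X.columns), filled=True)` draws the same structure with matplotlib.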
2. Data Storage in Separate Files
Explain why it makes sense to store the available data in separate files instead of saving everything in one large table. Name at least four benefits. The available tables represent a typical database structure. What is this structure called?
Data security
When saved in separate files, sensitive data are better protected. For example, in case of data corruption or loss, the impact is limited to the affected files rather than the whole dataset.
Enhanced performance
Large tables can lead to slower performance and longer processing times. With separate files, queries can run faster and more efficiently because they touch less unnecessary data.
Data integrity
When data are stored in smaller separate files, the complexity of managing and processing the data is reduced. This leads to fewer errors, which helps maintain accuracy and consistency.
Scalability and ease of integration
As the data grow, separate files scale more efficiently. New files can be added, or older ones removed, without touching the whole system. Similarly, separate files are more easily integrated into different systems.
Easier maintenance and debugging
If there is an issue with the data, identifying and fixing the problem is easier when the data are organized into separate files. In addition, tasks such as cleaning or updating can be performed more quickly, because the work is confined to specific files instead of the whole dataset.
The name of such a database structure is: relational database structure
Task 3: Determine how many T16 parts ended up in vehicles registered in Adelshofen.
Extract the unique column names (without .x or .y)
Initialize an empty DataFrame to store the combined columns
| | Herstellernummer | Fehlerhaft_Fahrleistung | Fehlerhaft_Datum | Produktionsdatum | Werksnummer | ID_T16 | X1 | Fehlerhaft |
|---|---|---|---|---|---|---|---|---|
| 1 | 212.0 | 0.0 | NaN | 2008-11-07 | 2121.0 | 16-212-2121-7 | 1 | 0.0 |
| 2 | 212.0 | 0.0 | NaN | 2008-11-08 | 2122.0 | 16-212-2122-41 | 2 | 0.0 |
| 3 | 212.0 | 0.0 | NaN | 2008-11-07 | 2121.0 | 16-212-2121-36 | 5 | 0.0 |
| 4 | 212.0 | 0.0 | NaN | 2008-11-07 | 2122.0 | 16-212-2122-20 | 10 | 0.0 |
| 5 | 212.0 | 0.0 | NaN | 2008-11-07 | 2122.0 | 16-212-2122-33 | 12 | 0.0 |
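Number of unique values per column:
| Column | Unique values |
|---|---|
| Herstellernummer | 3 |
| Fehlerhaft_Fahrleistung | 426 |
| Fehlerhaft_Datum | 2899 |
| Produktionsdatum | 2914 |
| Werksnummer | 5 |
| ID_T16 | 818844 |
| X1 | 818844 |
| Fehlerhaft | 2 |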
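The column-combining step could look roughly like the sketch below. It works on a tiny made-up frame with the same `.x` / `.y` / unsuffixed layout as the raw T16 table and assumes that, in each row, only one of the three variants of a column is filled.

```python
import re

import numpy as np
import pandas as pd

# Tiny made-up frame mimicking the raw layout (values are illustrative).
raw = pd.DataFrame({
    "ID_T16.x": ["16-212-2121-7", np.nan, np.nan],
    "ID_T16.y": [np.nan, "16-213-2132-44", np.nan],
    "ID_T16": [np.nan, np.nan, "16-215-2152-68"],
    "Fehlerhaft.x": [0.0, np.nan, np.nan],
    "Fehlerhaft.y": [np.nan, 1.0, np.nan],
    "Fehlerhaft": [np.nan, np.nan, 0.0],
})

# Unique base names in first-seen order, with a trailing .x or .y removed.
base_names = list(dict.fromkeys(re.sub(r"\.(x|y)$", "", col) for col in raw.columns))

combined = pd.DataFrame(index=raw.index)
for name in base_names:
    variants = [c for c in (f"{name}.x", f"{name}.y", name) if c in raw.columns]
    # Back-fill across the variants and keep the first non-missing value per row.
    combined[name] = raw[variants].bfill(axis=1).iloc[:, 0]

print(combined)
```

Keeping the base names in first-seen order also keeps the combined columns in a predictable order, which a plain `set` would not guarantee.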
Merge all the files
| | Herstellernummer | Fehlerhaft_Fahrleistung | Fehlerhaft_Datum | Produktionsdatum | Werksnummer | ID_T16 | X1 | Fehlerhaft | ID_Komponente | IDNummer | Unnamed: 0 | Gemeinden | Zulassung |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 212.0 | 0.0 | NaN | 2008-11-07 | 2121.0 | 16-212-2121-7 | 1 | 0.0 | K2ST2-109-1092-2 | 22-2-21-1 | 2897615 | RIESA | 02-01-2009 |
| 1 | 212.0 | 0.0 | NaN | 2008-11-08 | 2122.0 | 16-212-2122-41 | 2 | 0.0 | K2ST2-109-1092-59 | 22-2-21-2 | 2897616 | GROEDITZ | 02-01-2009 |
| 2 | 212.0 | 0.0 | NaN | 2008-11-07 | 2121.0 | 16-212-2121-36 | 5 | 0.0 | K2ST2-109-1092-7 | 22-2-21-5 | 2897619 | NAUNHOF | 02-01-2009 |
| 3 | 212.0 | 0.0 | NaN | 2008-11-07 | 2122.0 | 16-212-2122-20 | 10 | 0.0 | K2ST2-109-1092-41 | 22-2-21-10 | 2897624 | BEESKOW | 02-01-2009 |
| 4 | 212.0 | 0.0 | NaN | 2008-11-07 | 2122.0 | 16-212-2122-33 | 12 | 0.0 | K2ST2-109-1092-67 | 22-2-21-12 | 2897626 | EISENHUETTENSTADT | 02-01-2009 |
Filter the DataFrame where Gemeinden is 'Adelshofen'
Count the number of unique ID_T16 values
Number of unique ID_T16 where Gemeinden = 'Adelshofen': 8
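The merge, filter and count steps can be sketched on miniature tables. All IDs below are placeholders, and the join path (T16 part to seat component to vehicle to registration) is an assumption based on the columns visible in the merged table. Since the municipality names in the displayed rows are upper case, the comparison is done case-insensitively.

```python
import pandas as pd

# Miniature stand-ins for the three tables (all IDs are placeholders).
parts = pd.DataFrame({
    "ID_T16": ["T-1", "T-2", "T-3"],
    "ID_Komponente": ["S-1", "S-2", "S-3"],
})
vehicles = pd.DataFrame({
    "ID_Sitze": ["S-1", "S-2", "S-3"],
    "ID_Fahrzeug": ["V-1", "V-2", "V-3"],
})
registrations = pd.DataFrame({
    "IDNummer": ["V-1", "V-2", "V-3"],
    "Gemeinden": ["ADELSHOFEN", "DRESDEN", "ADELSHOFEN"],
})

# Part -> component -> vehicle -> registration.
merged = (
    parts.merge(vehicles, left_on="ID_Komponente", right_on="ID_Sitze")
         .merge(registrations, left_on="ID_Fahrzeug", right_on="IDNummer")
)

# Case-insensitive filter on the municipality, then count distinct parts.
mask = merged["Gemeinden"].str.upper() == "ADELSHOFEN"
n_parts = merged.loc[mask, "ID_T16"].nunique()
print("Number of unique ID_T16 in Adelshofen:", n_parts)
```

Counting with `nunique()` rather than `len()` avoids double-counting a part if the merges produce duplicate rows.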
Task 4: Identify the data types of the attributes in the registration table “Zulassungen_aller_Fahrzeuge.” Present your answers in a table integrated into your Markdown document and describe the characteristics of the data types.
| Attribute | Data Type |
|---|---|
| Unnamed: 0 | int64 |
| IDNummer | object |
| Gemeinden | object |
| Zulassung | object |
Total entries in 'Gemeinden': 1048575
Unique values in 'Gemeinden': 5752
Converted 'Gemeinden' column to category type.
Data types after converting 'Gemeinden':
| Attribute | Data Type |
|---|---|
| Index | int64 |
| IDNummer | object |
| Gemeinden | category |
| Zulassung | datetime64[ns] |
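With only 5,752 distinct values among more than a million entries, `Gemeinden` is a natural candidate for the category type. The conversions could be sketched as follows on a three-row excerpt; reading `Zulassung` as day-month-year is an assumption, since the displayed dates do not disambiguate the order.

```python
import pandas as pd

# Three-row excerpt in the layout of the registration table.
zulassungen = pd.DataFrame({
    "IDNummer": ["11-1-11-1", "11-1-11-2", "12-1-12-1"],
    "Gemeinden": ["DRESDEN", "DRESDEN", "LEIPZIG"],
    "Zulassung": ["01-01-2009", "01-01-2009", "01-01-2009"],
})

# Few distinct values relative to the row count: store as category.
zulassungen["Gemeinden"] = zulassungen["Gemeinden"].astype("category")

# Parse the registration date; the day-month-year format is an assumption.
zulassungen["Zulassung"] = pd.to_datetime(zulassungen["Zulassung"], format="%d-%m-%Y")

print(zulassungen.dtypes)
```

Passing an explicit `format` avoids silent misparsing of dates such as 02-01-2009, which could otherwise be read as either January or February.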
Data Types in "Zulassungen_aller_Fahrzeuge"
Data Types and Characteristics
| Attribute | Data Type | Characteristics |
|---|---|---|
| Index | int64 | Integer data type, used for numeric data. |
| IDNummer | object | Object data type, often used for text data or mixed data types. |
| Gemeinden | category | Category data type, used for categorical data to save memory. |
| Zulassung | datetime64[ns] | Datetime data type, used for date and time information. |
Task 5: Create a linear model from the table “Fahrzeuge_OEM1_Typ11_Fehleranalyse” relating mileage to suitable variables. Derive recommendations for OEM1 based on this model.
| | X | ID_Fahrzeug | Herstellernummer | Werksnummer | Fehlerhaft_Datum | Fehlerhaft_Fahrleistung | days | fuel | engine |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 11-1-11-9 | 1 | 11 | 2010-03-16 | 34824.319559 | 1493.150761 | 4.003670 | small |
| 1 | 11 | 11-1-11-11 | 1 | 11 | 2010-03-16 | 74217.428309 | 1044.462231 | 11.042487 | large |
| 2 | 13 | 11-1-11-13 | 1 | 11 | 2010-03-16 | 32230.699639 | 749.669810 | 3.579117 | small |
| 3 | 15 | 11-1-11-15 | 1 | 11 | 2010-03-16 | 44885.783551 | 858.688003 | 4.666801 | small |
| 4 | 37 | 11-1-11-37 | 1 | 11 | 2010-03-17 | 86348.329866 | 1478.204174 | 4.634381 | small |
| Column | Data Type |
|---|---|
| X | int64 |
| ID_Fahrzeug | object |
| Herstellernummer | int64 |
| Werksnummer | int64 |
| Fehlerhaft_Datum | object |
| Fehlerhaft_Fahrleistung | float64 |
| days | float64 |
| fuel | float64 |
| engine | object |
| | X | ID_Fahrzeug | Fehlerhaft_Datum | Fehlerhaft_Fahrleistung | days | fuel | Werksnummer_12 | engine_medium | engine_small |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 11-1-11-9 | 2010-03-16 | 34824.319559 | 1493.150761 | 4.003670 | False | False | True |
| 1 | 11 | 11-1-11-11 | 2010-03-16 | 74217.428309 | 1044.462231 | 11.042487 | False | False | False |
| 2 | 13 | 11-1-11-13 | 2010-03-16 | 32230.699639 | 749.669810 | 3.579117 | False | False | True |
| 3 | 15 | 11-1-11-15 | 2010-03-16 | 44885.783551 | 858.688003 | 4.666801 | False | False | True |
| 4 | 37 | 11-1-11-37 | 2010-03-17 | 86348.329866 | 1478.204174 | 4.634381 | False | False | True |
| Column | Data Type |
|---|---|
| X | int64 |
| ID_Fahrzeug | object |
| Fehlerhaft_Datum | object |
| Fehlerhaft_Fahrleistung | float64 |
| days | float64 |
| fuel | float64 |
| Werksnummer_12 | bool |
| engine_medium | bool |
| engine_small | bool |
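The encoding and model-fitting steps can be sketched as follows. The data are synthetic, the feature list is reduced to `fuel` and the engine dummies, and the 80/20 split is an assumption, so the numbers it prints are not comparable to the notebook's results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in for the Fehleranalyse table (assumption).
df = pd.DataFrame({
    "fuel": rng.uniform(3, 12, size=n),
    "engine": rng.choice(["small", "medium", "large"], size=n),
})
df["Fehlerhaft_Fahrleistung"] = 5000 * df["fuel"] + rng.normal(0, 10000, size=n)

# One-hot encode the engine size; the first level ("large") becomes the baseline.
encoded = pd.get_dummies(df, columns=["engine"], drop_first=True)

features = ["fuel", "engine_medium", "engine_small"]
X_train, X_test, y_train, y_test = train_test_split(
    encoded[features], encoded["Fehlerhaft_Fahrleistung"], test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

metrics = {
    "Mean Squared Error": mean_squared_error(y_test, pred),
    "R-squared": r2_score(y_test, pred),
    "Coefficients": pd.Series(model.coef_, index=features),
}
print(metrics)
```

Because `large` is the dropped baseline, the `engine_medium` and `engine_small` coefficients are read as differences in expected mileage relative to a large engine at the same fuel value.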
LinearRegression()
{'Mean Squared Error': 123515188.0775618,
'R-squared': 0.4900147838613257,
'Coefficients': X -0.000054
fuel 5213.628779
engine_medium 7705.514824
engine_small 10954.392346
dtype: float64}
Task 6: On 11.08.2010, there was a hit-and-run accident. The license plate of the car involved is unknown. The police have asked for your assistance, as you work for the Federal Motor Transport Authority, to find out where the vehicle with body part number “K5-112-1122-79” was registered.
'12-1-12-82'
'ASCHERSLEBEN'
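The two outputs above correspond to a two-step lookup: body part to vehicle, then vehicle to municipality. A minimal sketch on miniature tables follows; the rows are reduced to the columns needed, with values taken from the outputs shown in this notebook, and the variable names are illustrative.

```python
import pandas as pd

# Reduced stand-ins for the vehicle components and registration tables.
bestandteile = pd.DataFrame({
    "ID_Karosserie": ["K5-112-1122-1", "K5-112-1122-11", "K5-112-1122-79"],
    "ID_Fahrzeug": ["12-1-12-1", "12-1-12-2", "12-1-12-82"],
})
zulassungen = pd.DataFrame({
    "IDNummer": ["12-1-12-1", "12-1-12-2", "12-1-12-82"],
    "Gemeinden": ["LEIPZIG", "LEIPZIG", "ASCHERSLEBEN"],
})

# Step 1: find the vehicle that contains the body part.
vehicle_id = bestandteile.loc[
    bestandteile["ID_Karosserie"] == "K5-112-1122-79", "ID_Fahrzeug"
].iloc[0]

# Step 2: find where that vehicle was registered.
gemeinde = zulassungen.loc[zulassungen["IDNummer"] == vehicle_id, "Gemeinden"].iloc[0]

print(vehicle_id)
print(gemeinde)
```

On the full tables it is worth checking that each lookup returns exactly one row before taking `.iloc[0]`.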
